AITopics | molecule design

Collaborating Authors

molecule design

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Look the Other Way: Designing 'Positive' Molecules with Negative Data via Task Arithmetic

Özçelik, Rıza, de Ruiter, Sarah, Grisoni, Francesca

arXiv.org Artificial IntelligenceDec-9-2025

The scarcity of molecules with desirable properties (i.e., `positive' molecules) is an inherent bottleneck for generative molecule design. To sidestep such obstacle, here we propose molecular task arithmetic: training a model on diverse and abundant negative examples to learn 'property directions' - without accessing any positively labeled data - and moving models in the opposite property directions to generate positive molecules. When analyzed on 33 design experiments with distinct molecular entities (small molecules, proteins), model architectures, and scales, molecular task arithmetic generated more diverse and successful designs than models trained on positive molecules in general. Moreover, we employed molecular task arithmetic in dual-objective and few-shot design tasks. We find that molecular task arithmetic can consistently increase the diversity of designs while maintaining desirable complex design properties, such as good docking scores to a protein. With its simplicity, data efficiency, and performance, molecular task arithmetic bears the potential to become the de facto transfer learning strategy for de novo molecule design.

artificial intelligence, machine learning, molecule, (17 more...)

arXiv.org Artificial Intelligence

2507.17876

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Constrained Graph Variational Autoencoders for Molecule Design

Neural Information Processing SystemsNov-20-2025, 22:53:12 GMT

Graphs are ubiquitous data structures for representing interactions between entities. With an emphasis on applications in chemistry, we explore the task of learning to generate graphs that conform to a distribution observed in training data. We propose a variational autoencoder model in which both encoder and decoder are graph-structured. Our decoder assumes a sequential ordering of graph extension steps and we discuss and analyze design choices that mitigate the potential downsides of this linearization. Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is successful at matching the statistics of the original dataset on semantically important metrics. Furthermore, we show that by using appropriate shaping of the latent space, our model allows us to design molecules that are (locally) optimal in desired properties.

constrained graph variational autoencoder, molecule design, name change, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

Molecule Design by Latent Prompt Transformer

Neural Information Processing SystemsMay-27-2025, 11:07:25 GMT

This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task, where target biological properties or desired chemical constraints serve as conditioning variables.We propose the Latent Prompt Transformer (LPT), a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution modeled by a neural transformation of Gaussian white noise; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt. LPT can be learned by maximum likelihood estimation on molecule-property pairs. During property optimization, the latent prompt is inferred from target properties and constraints through posterior sampling and then used to guide the autoregressive molecule generation.After initial training on existing molecules and their properties, we adopt an online learning algorithm to progressively shift the model distribution towards regions that support desired target properties. Experiments demonstrate that LPT not only effectively discovers useful molecules across single-objective, multi-objective, and structure-constrained optimization tasks, but also exhibits strong sample efficiency.

latent prompt transformer, molecule design, target property, (1 more...)

Neural Information Processing Systems

Country: Asia > China > Guangxi Province > Nanning (0.08)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Add feedback

ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model

Fan, Chuanliu, Cao, Ziqiang, Ma, Zicheng, Yu, Nan, Peng, Yimin, Zhang, Jun, Gao, Yiqin, Fu, Guohong

arXiv.org Artificial IntelligenceFeb-27-2025

Goal-oriented de novo molecule design, namely generating molecules with specific property or substructure constraints, is a crucial yet challenging task in drug discovery. Existing methods, such as Bayesian optimization and reinforcement learning, often require training multiple property predictors and struggle to incorporate substructure constraints. Inspired by the success of Large Language Models (LLMs) in text generation, we propose ChatMol, a novel approach that leverages LLMs for molecule design across diverse constraint settings. Initially, we crafted a molecule representation compatible with LLMs and validated its efficacy across multiple online LLMs. Afterwards, we developed specific prompts geared towards diverse constrained molecule generation tasks to further fine-tune current LLMs while integrating feedback learning derived from property prediction. Finally, to address the limitations of LLMs in numerical recognition, we referred to the position encoding method and incorporated additional encoding for numerical values within the prompt. Experimental results across single-property, substructure-property, and multi-property constrained tasks demonstrate that ChatMol consistently outperforms state-of-the-art baselines, including VAE and RL-based methods. Notably, in multi-objective binding affinity maximization task, ChatMol achieves a significantly lower KD value of 0.25 for the protein target ESR1, while maintaining the highest overall performance, surpassing previous methods by 4.76%. Meanwhile, with numerical enhancement, the Pearson correlation coefficient between the instructed property values and those of the generated molecules increased by up to 0.49. These findings highlight the potential of LLMs as a versatile framework for molecule generation, offering a promising alternative to traditional latent space and RL-based approaches.

molecule, representation, substructure, (12 more...)

arXiv.org Artificial Intelligence

2502.19794

Genre:

Research Report (1.00)
Overview (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reviews: Constrained Graph Variational Autoencoders for Molecule Design

Neural Information Processing SystemsOct-8-2024, 01:44:29 GMT

Summary: This paper describes a model for generating graph-structured data, with molecule generation being the example task. This model is based around a variational autoencoder whose encoder/decoder are designed to handle graph-structured data. The decoder builds a graph sequentially by starting from an arbitrary node and sampling edges to other nodes, which are placed in a queue; upon sampling an edge to a "stopping node," the next node is taken from the queue and the process continues until there are no more nodes to expand. The distributions from which these samples are taken are a function of the graph state (notably, not the specific steps taken to arrive at the current state), where the state vectors are encoded using a gated graph neural network (GGNN). Additionally, masking functions can be specified that serve as hard constraints on the sorts of edges that may be sampled (in case these would lead to graphs that are disallowed, e.g. that would lead to impossible molecules).

constrained graph variational autoencoder, experiment, molecule design, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

Add feedback

Diagnosing and fixing common problems in Bayesian optimization for molecule design

Tripp, Austin, Hernández-Lobato, José Miguel

arXiv.org Machine LearningJun-11-2024

Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al, 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.

bayesian optimization, molecule, molecule design, (16 more...)

arXiv.org Machine Learning

2406.07709

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Molecule Design by Latent Prompt Transformer

Kong, Deqian, Huang, Yuhao, Xie, Jianwen, Wu, Ying Nian

arXiv.org Machine LearningOct-4-2023

This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.

artificial intelligence, machine learning, molecule, (15 more...)

arXiv.org Machine Learning

2310.03253

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

ChemSpacE: Interpretable and Interactive Chemical Space Exploration

#artificialintelligenceOct-5-2022, 12:35:48 GMT

Discovering meaningful molecules in the vast combinatorial chemical space has been a long-standing challenge in many fields from materials science to drug discovery. Recent advances in machine learning, especially generative models, have made remarkable progress and demonstrate considerable promise for automated molecule design. Nevertheless, most molecule generative models remain black-box systems, whose utility is limited by a lack of interpretability and human participation in the generation process. In this work we propose Chemical Space Explorer (ChemSpacE), a simple yet effective method for exploring the chemical space with pre-trained deep generative models. It enables users to interact with existing generative models and inform the molecule generation process. We demonstrate the efficacy of ChemSpacE on the molecule optimization task and the molecule manipulation task in single property and multi-property settings.

chemspace, interactive chemical space exploration, molecule optimization task, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

Probabilistic Generative Transformer Language models for Generative Design of Molecules

Wei, Lai, Fu, Nihang, Song, Yuqi, Wang, Qian, Hu, Jianjun

arXiv.org Artificial IntelligenceSep-19-2022

Self-supervised neural language models have recently found wide applications in generative design of organic molecules and protein sequences as well as representation learning for downstream structure classification and functional prediction. However, most of the existing deep learning models for molecule design usually require a big dataset and have a black-box architecture, which makes it difficult to interpret their design logic. Here we propose Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for generative design of molecules. Our model is built on the blank filling language model originally developed for text processing, which has demonstrated unique advantages in learning the "molecules grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf compared to other baselines. The probabilistic generation steps have the potential in tinkering molecule design due to their capability of recommending how to modify existing molecules with explanation, guided by the learned implicit molecule chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.09406

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology: